Chromosome 2 of Plasmodium falciparum was sequenced; this sequence contains
947,103 base pairs and encodes 210 predicted genes. In comparison with
the Saccharomyces cerevisiae genome, chromosome 2 has a lower gene density,
introns are more frequent, and proteins are markedly enriched in nonglobular
domains. A family of surface proteins, rifins, that may play a role in antigenic
variation was identified. The complete sequencing of chromosome 2 has shown
that sequencing of the A+T-rich P. falciparum genome is technically feasible.



Malaria, a disease caused by protozoan par- 
asites of the genus Plasmodium, is one of the 
most dangerous infectious diseases affecting 
human populations. Approximately 300 mil- 
lion to 500 million people are infected annu- 
ally, and 1.5 million to 2.7 million lives are 
lost to malaria each year, with most deaths 
occurring among children in sub-Saharan Africa
(1). Of the four species that cause malaria
in humans, P. falciparum is the greatest cause 
of morbidity and mortality. The resistance of
the malaria parasite to drugs and the resistance
of mosquitoes to insecticides have resulted
in a resurgence of malaria in many
parts of the world and a pressing need for
vaccines and new drugs. The identification of 
new targets for vaccine and drug develop- 
ment is dependent on the expansion of our 
understanding of parasite biology; this under- 
standing is hampered by the complexity of 
the parasite life cycle. The sequencing of the 
Plasmodium genome may circumvent many 
of these difficulties and rapidly increase our 
knowledge about these parasites. 

The P. falciparum genome is ~30 Mb in
size; has a base composition of 82% A+T;
and contains 14 chromosomes, which range 
from 0.65 to 3.4 Mb. Chromosomes from 
different wild isolates exhibit extensive size 
polymorphism. Mapping studies have indi- 
cated that the chromosomes contain central 
domains that are conserved between isolates 
and polymorphic subtelomeric domains that 
contain repeated sequences. P. falciparum 
also contains two organellar genomes. The 
mitochondrial genome is a 5.9-kb, tandemly 
repeated DNA molecule; a 35-kb circular 
DNA molecule, which encodes genes that are 
usually associated with plastid genomes, is 
located within the apicoplast [an organelle of 
uncertain function in Plasmodium and the 
related parasite Toxoplasma (2)]. 
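
The base composition quoted above is simply the share of A and T among all bases of a sequence. As a minimal sketch (the function name `at_content` and the toy sequence are illustrative assumptions, not part of the original analysis), it could be computed as follows:

```python
def at_content(seq: str) -> float:
    """Return the fraction of A and T among the unambiguous bases of seq."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    # Ambiguity codes such as N are ignored in both numerator and denominator.
    return at / (at + gc)


# Toy sequence (hypothetical, not taken from the P. falciparum genome).
toy = "ATATTTAAGCATTATAAATT"
print(f"A+T content: {at_content(toy):.1%}")
```

Ignoring ambiguous bases is a design choice of this sketch; a pipeline that counts them in the denominator would report slightly lower values.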

Chromosome 2 (GenBank accession number AE001362) was sequenced with the
shotgun sequencing approach, which was previously used to sequence several
microbial genomes (3, 4), with modifications to compensate for the A+T
richness of P. falciparum DNA (5). These modifications included the
following: the extraction of DNA from agarose under high-salt conditions to
prevent the DNA from melting at a high temperature, the avoidance of
ultraviolet (UV) light, the use of the "vector plus insert" protocol for
library construction, sequencing with dye-terminator chemistry, the use of a
reduced extension temperature in polymerase chain reactions (PCRs), and the
use of a transposon-insertion method for the closure of gaps that are very
rich in AT. The assembly software was also modified to minimize the
misassembly of A+T-rich sequences. The complete sequence included portions
of both telomeres and had an average redundancy of 11-fold; colinearity of
the final sequence and genomic DNA was proven with optical restriction and
yeast artificial chromosome (YAC) maps.

Chromosome 2 of P. falciparum (clone 3D7) is 947 kb in length and has an
overall base composition of 80.2% A+T. The chromosome contains a large
central region that encodes single-copy genes and several duplicated genes,
subtelomeric regions that contain variant antigen genes (var) (6-8),
repetitive interspersed family (RIF)-1 elements (9) and other repeats, and
typical eukaryotic telomeres (Fig. 1). The terminal 23-kb portions of the
chromosome are noncoding and exhibit 77% identity in opposite orientations.
The left and right telomeres consist of tandem repeats of the sequence
TT(TC)AGGG (10) and total 1141 and 551 nucleotides (nt), respectively. The
subtelomeric regions do not exhibit repeat oligomers until ~12 to 20 kb into
the chromosome, where rep20 (11) (a 21-bp tandem direct repeat found
exclusively in these regions) occurs 134 and 96 times in the left and right
ends of the chromosome, respectively. The sequence similarity that was
observed between the subtelomeric regions supports previous suggestions that
recombination between chromosome ends may be one mechanism by which genetic
diversity is generated. A region with centromere functions could not be
identified on the basis of sequence similarity to S. cerevisiae or other
eukaryotic centromeres (12). However, several regions of up to 12 kb are
devoid of large open reading frames (ORFs) and might contain the centromere.
Alternatively, centromeric functions may be defined by higher order DNA
structures and chromatin-associated protein complexes (13).
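
The telomeric repeat unit TT(TC)AGGG, read here as TTTAGGG or TTCAGGG, maps naturally onto a regular expression. The sketch below is an illustration only: the function name and the toy input are assumptions, it scans a single strand, and it is not the procedure used in the original analysis.

```python
import re

# TT(TC)AGGG is read as "TT, then T or C, then AGGG" (a 7-nt unit).
TANDEM_RUN = re.compile(r"(?:TT[TC]AGGG)+")
UNIT_LENGTH = 7


def longest_telomere_run(seq: str) -> int:
    """Return the number of repeat units in the longest uninterrupted tandem run."""
    runs = TANDEM_RUN.finditer(seq.upper())
    return max((len(m.group(0)) // UNIT_LENGTH for m in runs), default=0)


# Toy input (hypothetical): three tandem units, a 4-nt spacer, then one more unit.
toy = "TTTAGGGTTCAGGGTTTAGGGACGTTTCAGGG"
print(longest_telomere_run(toy))
```

A real telomere scan would also need the reverse complement and some tolerance for degenerate units, both of which are left out here for brevity.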

[Fig. 1, caption fragment: ... structure. Genes are color-coded according to
broad role categories as shown in the key.]

Two hundred and nine protein-encoding genes and a gene for tRNA-Glu (Fig. 1
and Table 1) were predicted (14) on chromosome 2, giving a gene density of
one gene per 4.5 kb, which is a value between that observed in yeast (one
gene per 2 kb) and in Caenorhabditis elegans (one gene per 7 kb). Of the 209
protein-encoding genes, 43% contain at least one intron. This percentage is
an estimate because some introns may have been missed by the gene-finding
method. Most spliced genes consist of two or three exons. In terms of intron
content and gene density, the Plasmodium genome, which was assessed by the
analysis of the first completed chromosome sequence, appears to be
intermediate between the condensed yeast genome and the intron-rich genomes
of multicellular eukaryotes.
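
The gene-density and intron figures in this paragraph follow from simple arithmetic on numbers given in the text and in Table 1. A minimal sketch is shown below; counting the tRNA gene in the denominator is an assumption of this example, since the text does not say how the 4.5-kb value was derived.

```python
CHROMOSOME_BP = 947_103   # chromosome 2 length given in the text
PROTEIN_GENES = 209       # predicted protein-encoding genes
TRNA_GENES = 1            # the single predicted tRNA gene
GENES_WITH_INTRONS = 90   # protein-encoding genes with introns (Table 1)

# Assumption: the tRNA gene is counted when computing gene density.
total_genes = PROTEIN_GENES + TRNA_GENES
kb_per_gene = CHROMOSOME_BP / total_genes / 1000
intron_share = GENES_WITH_INTRONS / PROTEIN_GENES

print(f"{kb_per_gene:.2f} kb per gene")
print(f"{intron_share:.0%} of protein-encoding genes contain an intron")
```

Leaving the tRNA gene out changes the result only in the second decimal place, so either convention rounds to the 4.5 kb quoted above.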

The proteins encoded in chromosome 2 (Table 2) fall into the following three
categories: (i) 72 proteins (34%) are conserved in other genera and contain
one or more distinct globular domains; (ii) 47 proteins (23%) belong to
Plasmodium-specific families with identifiable structural features and, in
some cases, known functions; and (iii) 90 predicted proteins (43%) have no
detectable homologs, although many contain structural features such as
signal peptides and transmembrane domains. Homologs outside Plasmodium were
detected for 87 (42%) of the 209 predicted proteins. These include proteins
in the first category, in addition to those proteins in the second category
that possess a conserved domain or domains that are arranged in a manner
unique to Plasmodium. The percentage of evolutionarily conserved proteins is
about two times lower than that found for other genomes, mainly because most
of the remaining proteins were predicted to consist primarily of nonglobular
domains (15) (Table 1). The abundance of nonglobular domains in Plasmodium
proteins is very unusual; the proportion of proteins with predicted large
nonglobular domains in other eukaryotes, such as S. cerevisiae (Table 1) or
C. elegans (16), is approximately half that observed in Plasmodium.
Furthermore, 13 of the 87 conserved proteins on chromosome 2 appear to
contain large nonglobular structures (>30 amino acids) that are inserted
directly into globular domains, as determined by alignment with homologs
from other species.
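
The category counts in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text; the dictionary keys are shorthand labels chosen for this example, and the figure of 15 second-category proteins with homologs is an inference from 87 minus 72, not a number stated in the source.

```python
TOTAL = 209
categories = {
    "conserved, globular domains": 72,
    "Plasmodium-specific families": 47,
    "no detectable homologs": 90,
}
WITH_HOMOLOGS = 87

# The three categories should account for every predicted protein.
assert sum(categories.values()) == TOTAL

# Unrounded shares; the text reports them rounded so that they sum to 100%.
for name, count in categories.items():
    print(f"{name}: {count} ({count / TOTAL:.1%})")

# Inferred: homologs beyond the first category must come from the second.
second_category_with_homologs = WITH_HOMOLOGS - categories["conserved, globular domains"]
print(second_category_with_homologs)
```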

To determine whether nonglobular domains and proteins are expressed in P.
falciparum, we performed a reverse transcriptase (RT)-PCR on 11 nonglobular
domains and on two genes that encoded predominantly nonglobular proteins,
using total blood-stage RNA as a template. In all cases, RT-PCR products
were the same size as those that were amplified from genomic DNA, and the
sequence of RT-PCR products matched the genomic DNA sequence (17). Thus, it
is likely that most, if not all, predicted nonglobular domains in chromosome
2 genes are expressed. One example of the insertion of a nonglobular domain
into a well-defined globular domain is seen in a protein containing a 5'-3'
exonuclease (Fig. 2). The alignment of the Plasmodium sequence with four
bacterial exonucleases revealed a 176-amino acid insertion in a region
between a strand and a helix in the three-dimensional structure of



REPORTS 

this protein (18). This suggests that eukaryotic proteins can accommodate
inserts that may be excluded from the protein core folding without impairing
the protein function. The propagation of nonglobular domains in Plasmodium
suggests that such proteins provide specific selective advantages to the
parasite. A structural analysis of Plasmodium proteins that contain
nonglobular inserts may be valuable for understanding the general principles
of protein folding.

Of the 87 conserved proteins that are encoded on chromosome 2, 71 (83%) show
the greatest similarity to eukaryotic homologs (Table 2). In contrast, the
remaining 16 proteins are most similar to bacterial proteins, and 4 of these
represent the first eukaryotic members of protein families that have
previously been seen only in bacteria. At least some of these 16 genes may
have been transferred to the nuclear genome from an organellar genome after
the divergence of the phylum Apicomplexa from other eukaryotic lineages.
Several of these proteins appear to contain NH2-terminal organellar import
peptides (19) and may function within the apicoplast or the mitochondrion.
One such gene encodes 3-ketoacyl-acyl carrier protein (ACP) synthase III
(FabH), which catalyzes the condensation of acetyl-coenzyme A and
malonyl-ACP in type II (dissociated) fatty acid synthase systems. Type II
synthase systems are restricted to bacteria and the plastids of plants,
confirming previous hypotheses that the Plasmodium apicoplast contains
metabolic pathways that are distinct from those of the host (20, 21).

Because the phylum Apicomplexa represents a deep branch in the eukaryotic
tree, the presence of eukaryotic-specific genes in P. falciparum suggests
the appearance of these genes early in eukaryotic evolution. Most of these
genes code for proteins that are involved in DNA replication, repair,
transcription, or translation (Table 2) and include the origin recognition
complex subunit 5, excision repair proteins ERCC1 and RAD2, and proteins
involved in chromatin dynamics (such as the BRAHMA helicase, an ortholog of
the DRING protein containing the RING finger domain, and chromatin protein
SNW1). Furthermore, several eukaryotic proteins involved in secretion are
encoded in chromosome 2 (such as the SEC61 γ subunit, the coated pit
coatomer subunit, and syntaxin), suggesting an early emergence of the
eukaryotic secretory system.

Proteins of the DnaJ superfamily act as cofactors for HSP70-type molecular
chaperones and participate in protein folding and trafficking, complex
assembly, organelle biogenesis, and initiation of translation (22). Five
proteins containing DnaJ domains are present on chromosome 2, which suggests
multiple roles for this domain in the Plasmodium life cycle. Two of these
proteins consist primarily of the DnaJ domain, whereas three of the five
proteins also contain a large nonglobular domain. Several proteins
containing a DnaJ domain have been detected on other chromosomes, indicating
that this is a large gene family in Plasmodium (23). One of its members, the
ring-infected erythrocyte surface antigen, binds to the cytoplasmic side of
the erythrocyte membrane, suggesting that DnaJ domains perform
chaperone-like functions in the formation of protein complexes at this
location (24). DnaJ domains in some P.



Table 1. Summary of features of P. falciparum chromosome 2 (P. f. chr 2) and comparison to S. cerevisiae
chromosome 3 (S. c. chr 3). Protein structural features were predicted as described (14). ND, not
determined. Numbers in parentheses indicate the percentage of the total genes or proteins with the
specified properties.



Description                                                               P. f. chr 2   S. c. chr 3

Chromosome length (kb)                                                    945           315
Percent G+C content                                                       19.7          38.6
  Exons                                                                   24.3          40.0
  Introns                                                                 13.3          ND
Kilobases per gene                                                        4.50          1.73
Number of predicted protein-coding regions                                209           171
Number of genes with introns (%)                                          90 (43)       4 (2.2)
tRNA genes                                                                1             10

Class of proteins
  Total                                                                   209           171
  Secreted (%)                                                            22 (11)       11 (6)
  Integral membrane (%)                                                   90 (43)       42 (24)
  Integral membrane with multiple predicted transmembrane domains (%)     27 (13)       21 (12)
  Containing coiled-coil domains (%)                                      111 (53)      32 (19)
  Containing other large compositionally biased regions with
    predicted nonglobular structure (%)                                   155 (74)      71 (41)
  Completely nonglobular (%)                                              17 (8)        6 (3.5)
  With detectable homologs in other species                               87 (42)       145 (85)

Table 2. Identification of genes on P. falciparum chromosome 2. The
PF number is the systematic name assigned according to a method adapted from
S. cerevisiae (14). The description contains the name (if known) and prominent
features of the gene. The table includes genes with homologs in other species and
members of Plasmodium gene families. An expanded version of this table with
additional information is available on the World Wide Web at
www.tigr.org/tab/mdb/pfdb/pfdb.html. Prt, protein; OO, organellar origin;
TP, transit peptide; ATP, adenosine triphosphate; euk., eukaryotic; nt, nucleotide.



PF number  Description

Amino acid biosynthesis
PFB0200C   Aspartate aminotransferase

Biosynthesis of cofactors, prosthetic groups, and carriers
PFB0130W   Prenyl transferase
PFB0220W   Ubiquinone biosynthesis methyltransferase

Fatty acid and phospholipid metabolism
PFB0385W   Acyl-carrier prt
PFB0410C   Phospholipase A2-like α/β fold hydrolase
PFB0505C   3-ketoacyl carrier prt synthase III, FabH (OO, TP)
PFB0685C   ATP-dependent acyl-CoA synthetase (TP)
PFB0695C   ATP-dependent acyl-CoA synthetase (TP)

Purines, pyrimidines, nucleosides, and nucleotides
PFB0295W   Adenylosuccinate lyase (OO)

DNA metabolism
PFB0160W   ERCC1-like excision repair prt
PFB0180W   Prt with 5'-3' exonuclease domain (OO, TP)
PFB0205C   Prt with 5'-3' exonuclease domain (Kem-1 family)
PFB0265C   RAD2 endonuclease
PFB0440C   Chromatinic RING finger prt, DRING ortholog
PFB0720C   Origin recognition complex subunit 5 (ATPase)
PFB0730W   BRAHMA ortholog (DNA helicase superfamily II)
PFB0840W   Replication factor C, 40-kDa subunit (replication activator)
PFB0875C   Chromatin-binding prt (SKI/SNW family)
PFB0895C   Replication factor C, 140-kDa subunit (ATPase)

Energy metabolism
PFB0795W   ATP synthase alpha chain
PFB0880W   FAD-dependent oxidoreductase (OO)

Transcription
PFB0140W   Metal-binding prt (DHHC domain)
PFB0175C   Prt of the MAK16 family
PFB0215C   Prt with Egl-like 3'-5' exonuclease domain
PFB0245C   RNA polymerase 16-kD subunit, RPB4-like
PFB0255W   RRM-type RNA-binding prt
PFB0290C   Zn-ribbon transcription factor (TFIIS family)
PFB0370C   RNA-binding prt (KH domain)
PFB0445C   eIF-4A-like DEAD family RNA helicase
PFB0620W   YOU2-like small euk. C2C2 Zn finger prt
PFB0715W   DNA-directed RNA polymerase subunit 2
PFB0725C   Metal-binding prt (DHHC domain)
PFB0855C   rRNA methylase (SpoU family) (OO, TP)
PFB0860C   RNA helicase
PFB0865W   Small nuclear ribonucleoprt (SNRNP family)
PFB0890C   Pseudouridine synthetase (RsuA family); first euk. member (OO)

Translation and post-translational modification
PFB0165W   tRNA-Glu
PFB0240W   PINT domain prt (proteasomal subunit)
PFB0260W   PSD2-like 26S proteasomal subunit
PFB0325C   SERA antigen/protease with active Cys
PFB0330C   SERA antigen/protease with active Cys
PFB0335C   SERA antigen/protease with active Cys
PFB0340C   SERA antigen/protease with active Ser
PFB0345C   SERA antigen/protease with active Ser
PFB0350C   SERA antigen/protease with active Ser
PFB0355C   SERA antigen/protease with active Ser
PFB0360C   SERA antigen/protease with active Ser
PFB0380C   Phosphatase (acid phosphatase family)
PFB0390W   Ribosome releasing factor (OO, TP)
PFB0455W   Ribosomal prt L37A
PFB0515W   Glycosyl transferase (novel euk. family)
PFB0525W   Asparaginyl-tRNA synthetase (OO, TP)
PFB0545C   Ribosomal prt L7/L12 (OO)
PFB0550W   Euk. peptide chain release factor
PFB0585W   Leu/Phe-tRNA prt transferase, first euk. member (OO)
PFB0645C   Ribosomal prt L13 (OO)
PFB0830W   Ribosomal prt S26
PFB0885W   Ribosomal prt S30

Regulatory functions
PFB0150C   Ser/Thr prt kinase
PFB0510W   GAF domain prt (cyclic nt signal transduction)
PFB0520W   Novel prt kinase
PFB0605W   Ser/Thr prt kinase
PFB0665W   Ser/Thr prt kinase
PFB0815W   Calcium-dependent prt kinase (C-terminus EF hand)

Transport
PFB0210C   Monosaccharide transporter
PFB0275W   Membrane transporter
PFB0435C   Predicted amino transporter
PFB0465C   Membrane transporter

Cell surface
PFB0010W   var gene
PFB0015C   Rifin
PFB0020C   var gene fragment
PFB0025C   Rifin
PFB0030C   Rifin
PFB0035C   Rifin
PFB0040C   Rifin
PFB0045C   var gene fragment
PFB0050C   Rifin pseudogene
PFB0055C   Rifin
PFB0060W   Rifin
PFB0065W   Rifin
PFB0100C   Knob-associated His-rich prt
PFB0300C   Merozoite surface antigen MSP-2
PFB0305C   Merozoite surface antigen MSP-5 (EGF domain)
PFB0310C   Merozoite surface antigen MSP-4 (EGF domain)
PFB0400W   PfS230 paralog (predicted secreted prt)
PFB0405W   Transmission-blocking target antigen PfS230
PFB0570W   Predicted secreted prt (thrombospondin domain)
PFB0760W   Mtn3/RAC1IP-like prt
PFB0915W   RESA-H3 antigen
PFB0955W   Rifin
PFB0975C   var gene fragment
PFB1000W   Rifin pseudogene
PFB1005W   Rifin
PFB1010W   Rifin
PFB1015W   Rifin
PFB1020W   Rifin
PFB1025W   var gene fragment
PFB1030W   var gene fragment
PFB1035W   Rifin
PFB1040W   Rifin
PFB1045W   var gene fragment
PFB1050W   Rifin
PFB1055C   var gene

Other cellular processes
PFB0085C   Prt with DnaJ domain (RESA-like)
PFB0090C   Prt with DnaJ domain
PFB0450W   Prt translocation complex, SEC61 γ chain
PFB0480W   Syntaxin
PFB0500C   RAB GTPase
PFB0595W   Prt with DnaJ domain, DNJ1/SIS1 family
PFB0635W   T-complex prt 1 (HSP60 fold superfamily)
PFB0640C   WEB-1 ortholog, WD40
PFB0750W   VPS45-like prt (STXBP/UNC-18/SEC1 family)
PFB0805C   Clathrin coat assembly prt
PFB0920W   Prt with DnaJ domain (RESA-like)
PFB0925W   Prt with DnaJ domain (RESA-like)

Unknown function
PFB0270W   SLR1419 family prt (OO)
PFB0320C   HesB family prt (possible redox activity, OO, TP)
PFB0420W   YgdB prt, first euk. member (OO, TP)
PFB0425C   YMR7 family prt

Fig. 2. Multiple alignment of the predicted 5'-3' exonuclease
(PFB0180W) encoded in chromosome 2 with homologous bacterial exonuclease
domains showing the large nonglobular insert in Plasmodium.
The alignment was constructed with the profile alignment option of
CLUSTALW (34). The alignment column shading is based on a 100%
consensus, which is shown underneath the alignment; h indicates hydrophobic
residues (A, C, F, I, L, M, V, W, and Y), u indicates "tiny" residues
(G, A, and S), o indicates hydroxy residues (S and T), c indicates charged
residues (D, E, K, R, and H), and + indicates positively charged residues
(K and R) (35). The aspartates involved in metal coordination have a red
background and inverse type. Secondary structure elements derived from
the crystal structure of Thermus aquaticus DNA polymerase (18) are
shown above the alignment (H indicates a helix, and E indicates extended
conformation, or β strand). 5'-3'-exo_Aae is a stand-alone exonuclease
from Aquifex aeolicus, and the remaining bacterial sequences are the
NH2-terminal domains of DNA polymerase I.
[Alignment not reproduced. Sequences shown: PFB0180W, DPO1_THEAQ,
5'-3'-exo_Aae, DPO1_BACCA, and DPO1_ECOLI.]




falciparum proteins contain substitutions in 
the His-Pro-Asp signature that is required for 
interaction with HSP-70-type proteins, 
which may indicate a modification of the 
typical chaperone function. 

Chromosome 2 contains five protein families that are unique to Plasmodium in
terms of their distinct domain organization, although three of them contain
domains that are conserved in other genera. The genes encoding the
Plasmodium-specific families are primarily located near the ends of the
chromosome. A single var gene was identified in each subtelomeric region.
The var genes encode large transmembrane proteins (PfEMP1) expressed in
knobs on the surface of schizont-infected red cells. PfEMP1 proteins exhibit
extensive sequence diversity; are clonally variant; and are involved in
antigenic variation, cytoadherence, and rosetting (6-8). In addition to the
full-length var genes, six small ORFs were identified in the subtelomeric
regions that were similar to var sequences. Five of these ORFs resembled the
var exon II cDNAs or the Pf60.1 sequences that were reported previously
(7, 25).

The largest Plasmodium-specific family found on chromosome 2 encodes
proteins that were dubbed rifins, after the RIF-1 repetitive element. RIF-1
contained a 1-kb ORF but no initiation codon, was found on most chromosomes,
and was transcribed in late blood-stage parasites (9). The function of the
RIF-1 element was unknown. Eighteen ORFs with similarities to RIF-1 were
found in the subtelomeric regions of chromosome 2, centromeric to the var
genes. An inspection of the sequence upstream of these ORFs revealed exons
encoding signal peptides, which indicated that the RIF-1 elements were
actually genes consisting of two exons. These genes encode potential
transmembrane proteins of 27 to 35 kD, with an extracellular domain that
contains conserved Cys residues that might participate in disulfide bonding,
a transmembrane segment, and a short basic COOH-terminus. The extracellular
domain also contains a highly variable region (Fig. 3). RT-PCR with schizont
RNA showed that one of six rifin genes that were tested was transcribed. The
function of the rifins is unknown, but their sequence diversity, predicted
cell surface localization, and expression in schizont stages suggest that,
like var genes, they may be clonally variant. Multiple rifin genes were
detected in the telomeric regions of chromosomes 3 and 14, suggesting that
rifin genes have propagated as clusters in the course of Plasmodium
evolution (26). If the number found on chromosome 2 is representative of
other chromosomes, there may be 500 or more rifin genes in the P. falciparum
genome (~7% of all protein-coding genes), making it the most abundant gene
family in this organism. The presence of var and rifin genes and other ORFs
in subtelomeric regions of P. falciparum chromosomes confirms that the
subtelomeric regions are not transcriptionally silent (27).
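
The genome-wide rifin estimate can be reproduced approximately from numbers quoted elsewhere in the text. The source does not state how the figure of 500 was obtained, so the sketch below rests on two labeled assumptions: that rifin counts scale with chromosome length, and that the chromosome 2 gene density holds for the whole genome.

```python
GENOME_KB = 30_000     # approximate genome size from the text (~30 Mb)
CHR2_KB = 947          # chromosome 2 length
RIFINS_ON_CHR2 = 18    # ORFs with similarity to RIF-1 on chromosome 2
KB_PER_GENE = 4.5      # gene density reported for chromosome 2

# Assumption: the number of rifin genes scales with chromosome length.
estimated_rifins = RIFINS_ON_CHR2 * GENOME_KB / CHR2_KB

# Assumption: the chromosome 2 gene density applies genome-wide.
estimated_genes = GENOME_KB / KB_PER_GENE
share_for_500 = 500 / estimated_genes

print(f"estimated rifin genes: {estimated_rifins:.0f}")
print(f"500 rifins as a share of all genes: {share_for_500:.1%}")
```

Under these assumptions the length-scaled estimate lands above 500 and a 500-gene family corresponds to roughly 7 to 8% of all genes, which is consistent with the rounded figures in the text.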

Another family of membrane-associated proteins, serine repeat antigens
(SERAs), contains a papain protease-like domain. A cluster of three SERA
genes, which were all transcribed in the same direction (from centromere to
telomere), was known to be on chromosome 2 (28); at least one SERA has been
evaluated for use in blood-stage vaccines. These genes are part of an
eight-gene cluster; seven genes have a similar four-exon structure, but the
gene at the 3' end of the cluster contains only three exons. The protease
domains in these proteins are unusual because five of the eight contain
serine instead of cysteine in the active nucleophile position, suggesting
that they are serine proteases with a structure that is typical of cysteine
proteases (29).

Two proteins (MSP-4 and MSP-5) that contain an epidermal growth factor (EGF)
module in their extracellular domains were identified (30, 31). In organisms
that are not classified in the animal kingdom, MSP-4,
Fig. 3. Multiple sequence alignment of rifins encoded on chromosome 2.
The predicted coding regions were aligned with CLUSTALW (34) using
the default settings. The alignment column shading is based on a 95%
consensus, which is shown underneath the alignment; h indicates hydrophobic
residues (A, C, F, I, L, M, V, W, and Y), p indicates polar residues
(D, E, H, K, N, Q, R, S, and T), b indicates "big" residues (F, I, L, V, W, Y,
K, R, Q, and E), and + indicates positively charged residues (K and R) (35).
The cysteines conserved in subsets of rifins are shown by inverse type.
[Alignment not reproduced.]



MSP-5, and MSP-1 (a multi-EGF domain 
protein encoded on chromosome 3) and two 
Plasmodium sexual-stage antigens (32) are 
the only proteins that contain EGF repeats, 
which suggests that Plasmodium obtained the 
sequence for this domain from its animal 
host. The plasmodial EGF domains may be 
involved in parasite adhesion to host cells. 

In addition to the families of Plasmodium-specific proteins, chromosome 2
contains genes for many secreted and membrane proteins. One of these genes
encodes a protein with a modified thrombospondin domain and was transcribed
in blood-stage parasites (17). Other Plasmodium proteins containing
thrombospondin domains, such as sporozoite surface protein 2/TRAP and
circumsporozoite protein, are involved in the parasitic invasion of host
cells (33), suggesting that this protein may be involved in the binding of
infected red cells to host-cell ligands.

Determination of the first P. falciparum 
chromosome sequence demonstrates that the 
A+T richness of P. falciparum DNA will not 
prevent the sequencing of the genome. Al- 
though technical difficulties not observed 
during the sequencing of other microbial ge- 
nomes were encountered, solutions to these 
problems were found that will facilitate se- 
quencing of the remaining chromosomes. 
The genome sequence should be of value in 
the study of Plasmodium biology and in the 
development of new drugs and vaccines for 
the treatment and prevention of malaria. In 
addition to these practical benefits, the Plasmodium genome sequence should
provide broader biological insights, particularly in regard to the
plasticity of the eukaryotic genome that is manifest in the preponderance of
the predicted nonglobular domains in plasmodial proteins.